Quantifying Defensive Impact in Football

A Data-Driven Approach Using the NFL Big Data Bowl Dataset and Advanced Machine Learning Techniques

Dusty Turner

Understanding Kaggle

What is Kaggle?

  • Online platform for data science and machine learning
  • Founded in 2010, subsidiary of Google LLC
  • Global community of data scientists and machine learning practitioners

Key Features of Kaggle

  • Competitions: Solve real-world problems, win prizes
  • Datasets: Access a vast repository of free datasets
  • Notebooks (formerly Kernels): Write and execute Python/R code, share work
  • Community: Share insights and collaborate globally

This year’s competition sets a broad goal: create metrics that assign value to elements of tackling.

Research Question: Can we estimate, for each play, the probability that each defensive player on the field makes the tackle?

Ultimate goal: Assign a ‘tackles over expected’ value to each player

Data Available

Lots of Plays

Week  Games  Plays
1     16     1247
2     16     1198
3     16     1259
4     16     1219
5     16     1260
6     14     1123
7     14     1122
8     15     1220
9     13     1014

Every Play, Every Week

display_name      jersey_number  club      n   week
Dak Prescott      4              DAL       49  week_1
Ezekiel Elliott   21             DAL       49  week_1
Tyler Biadasz     63             DAL       49  week_1
Connor McGovern   66             DAL       49  week_1
Zack Martin       70             DAL       49  week_1
Tyler Smith       73             DAL       49  week_1
Terence Steele    78             DAL       49  week_1
Noah Brown        85             DAL       49  week_1
Dalton Schultz    86             DAL       49  week_1
Jake Ferguson     87             DAL       49  week_1
CeeDee Lamb       88             DAL       49  week_1
Joe Tryon         9              TB        49  week_1
Carlton Davis     24             TB        49  week_1
Antoine Winfield  31             TB        49  week_1
Mike Edwards      32             TB        49  week_1
Jamel Dean        35             TB        49  week_1
Devin White       45             TB        49  week_1
Vita Vea          50             TB        49  week_1
Lavonte David     54             TB        49  week_1
Shaquil Barrett   58             TB        49  week_1
William Gholston  92             TB        49  week_1
Akiem Hicks       96             TB        49  week_1
football          NA             football  49  week_1

Player Location Data Every 0.1 Seconds

Dak Prescott
frame_id  time                   x      y      s     a     dis   o       dir
11        2022-09-11 20:24:18    90.17  24.22  1.11  5.70  0.09  253.24  62.90
12        2022-09-11 20:24:18.1  90.32  24.30  1.89  6.32  0.17  255.27  61.79
13        2022-09-11 20:24:18.2  90.54  24.43  2.72  6.43  0.26  255.27  61.40
14        2022-09-11 20:24:18.3  90.82  24.57  3.39  5.93  0.31  258.85  61.52
15        2022-09-11 20:24:18.4  91.15  24.74  3.97  5.17  0.37  266.36  61.79
16        2022-09-11 20:24:18.5  91.52  24.93  4.44  4.22  0.42  295.01  62.78
17        2022-09-11 20:24:18.6  91.94  25.12  4.78  3.17  0.46  346.41  64.53
18        2022-09-11 20:24:18.7  92.39  25.32  5.01  2.37  0.49  32.44   66.77
19        2022-09-11 20:24:18.8  92.86  25.50  5.12  1.90  0.50  42.71   69.27
20        2022-09-11 20:24:18.9  93.34  25.66  5.06  2.20  0.51  49.75   73.18

x, y: position on the field (yards); s: speed (yards/s); a: acceleration (yards/s²); dis: distance traveled since the previous frame (yards); o: player orientation (degrees); dir: direction of motion (degrees)
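The `dis` column can be checked against consecutive positions: the straight-line distance between frames 11 and 12 is about 0.17 yards, matching `dis` on frame 12. A minimal sketch using the Dak Prescott rows above (hard-coded here for illustration):

```python
import math

# (frame_id, x, y, dis) for Dak Prescott, copied from the sample rows above
ROWS = [
    (11, 90.17, 24.22, 0.09),
    (12, 90.32, 24.30, 0.17),
    (13, 90.54, 24.43, 0.26),
    (14, 90.82, 24.57, 0.31),
    (15, 91.15, 24.74, 0.37),
    (16, 91.52, 24.93, 0.42),
    (17, 91.94, 25.12, 0.46),
    (18, 92.39, 25.32, 0.49),
    (19, 92.86, 25.50, 0.50),
    (20, 93.34, 25.66, 0.51),
]


def step_distances(rows):
    """Return (frame_id, computed distance, reported dis) for every frame after the first."""
    out = []
    for (_, x0, y0, _), (frame, x1, y1, dis) in zip(rows, rows[1:]):
        # Euclidean distance covered since the previous 0.1 s frame, in yards
        out.append((frame, math.hypot(x1 - x0, y1 - y0), dis))
    return out


for frame, computed, reported in step_distances(ROWS):
    print(f"frame {frame}: computed {computed:.2f} yd, reported dis {reported:.2f} yd")
```

Every computed value agrees with the reported `dis` to within rounding, which supports reading `dis` as the frame-to-frame distance.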

Data Available

Player & Game Identifiers

  • Game and Play IDs: Unique identifiers for games and individual plays
  • Player Information: Names, jersey numbers, team, position, physical attributes, college

In-Game Player Movements

  • Spatial Data: Player positions, movement direction, speed, and orientation
  • Time and Motion: Specific moments in play, distance covered

Detailed Play Information

  • Play Attributes: Description, quarter, down, yards to go
  • Team & Field Position: Possessing team, defensive team, yardline positions

Scoring and Game Probabilities

  • Scores & Results: Pre-snap scores, play outcomes
  • Probabilities: Win probabilities for home and visitor teams
  • Expected Points: Expected points and expected points added (EPA) for each play

Tackles, Penalties, and Formations

  • Tackles & Fouls: Tackles, assists, fouls committed, and missed tackles
  • Ball Carrier Info: Identifiers and names of ball carriers
  • Team Formations: Offensive formations and number of defenders in the box

Tracking rows by event tag (NA: no event tagged on that frame)

event                       n
NA                          1,277,171
first_contact               28,773
tackle                      26,928
ball_snap                   16,415
pass_outcome_caught         15,870
handoff                     15,364
pass_arrived                13,915
out_of_bounds               5,037
run                         2,737
man_in_motion               1,288
play_action                 1,035
touchdown                   1,012
fumble                      621
shift                       368
qb_slide                    350
pass_forward                248
line_set                    46
snap_direct                 46
lateral                     45
autoevent_ballsnap          30
fumble_defense_recovered    23
fumble_offense_recovered    23
pass_shovel                 23
qb_sack                     23
run_pass_option             23
autoevent_passinterrupted   16
autoevent_passforward       9

Created Features

Example Play

Distance to the Ball
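A minimal sketch of a distance-to-the-ball feature, assuming it is the Euclidean distance in yards between each player and the `football` row on the same frame. The frame layout and the player labels below are hypothetical, not the author's code.

```python
import math
from typing import Dict, Tuple

# Hypothetical frame layout: display_name -> (x, y) in yards
Frame = Dict[str, Tuple[float, float]]


def distance_to_ball(frame: Frame, ball_name: str = "football") -> Dict[str, float]:
    """Euclidean distance (yards) from every player on the frame to the ball."""
    bx, by = frame[ball_name]
    return {
        name: math.hypot(x - bx, y - by)
        for name, (x, y) in frame.items()
        if name != ball_name
    }


# Illustrative coordinates only, not real tracking data
frame = {
    "football": (50.0, 26.0),
    "defender_a": (53.0, 30.0),
    "defender_b": (51.0, 26.0),
}
print(distance_to_ball(frame))  # {'defender_a': 5.0, 'defender_b': 1.0}
```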

Speed Vector Similarity
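The slide does not define speed vector similarity; one plausible reading, assumed here, is the cosine similarity between a defender's velocity vector and the ball carrier's. Converting `s` and `dir` to components as vx = s*sin(dir), vy = s*cos(dir) is consistent with the Dak Prescott sample (dir near 62° while x grows roughly twice as fast as y), but treat that angle convention as an assumption to confirm against the data dictionary.

```python
import math


def velocity(s: float, dir_deg: float) -> tuple:
    """Velocity components (yards/s), assuming dir is measured clockwise from the +y axis."""
    rad = math.radians(dir_deg)
    return s * math.sin(rad), s * math.cos(rad)


def speed_vector_similarity(s1, dir1, s2, dir2) -> float:
    """Cosine similarity of two players' velocity vectors; 0.0 if either player is stationary."""
    v1x, v1y = velocity(s1, dir1)
    v2x, v2y = velocity(s2, dir2)
    n1, n2 = math.hypot(v1x, v1y), math.hypot(v2x, v2y)
    if n1 == 0 or n2 == 0:
        return 0.0
    return (v1x * v2x + v1y * v2y) / (n1 * n2)


# Same heading scores 1; opposite headings score -1
print(round(speed_vector_similarity(5.0, 90.0, 3.0, 90.0), 3))   # 1.0
print(round(speed_vector_similarity(5.0, 90.0, 3.0, 270.0), 3))  # -1.0
```

Cosine similarity ignores how fast each player is moving; if relative speed matters, a raw dot product or a separate difference in `s` could be used instead.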

Projected Movement
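Projected movement is not specified on the slide; a minimal sketch, assuming constant velocity over a short horizon, extrapolates a player's position from `x`, `y`, `s`, and `dir`. The horizon `t` and the clockwise-from-+y angle convention are assumptions.

```python
import math


def project_position(x: float, y: float, s: float, dir_deg: float, t: float = 1.0):
    """Extrapolate (x, y) forward t seconds at constant speed s (yards/s) along dir.

    Assumes dir is measured clockwise from the +y axis, so vx = s*sin(dir), vy = s*cos(dir).
    """
    rad = math.radians(dir_deg)
    return x + s * math.sin(rad) * t, y + s * math.cos(rad) * t


# Dak Prescott at frame 20 of the sample rows, projected 0.5 s ahead
px, py = project_position(93.34, 25.66, 5.06, 73.18, t=0.5)
print(f"({px:.2f}, {py:.2f})")  # (95.76, 26.39)
```

Constant velocity ignores `a`; including acceleration would also require knowing its direction, which the single `a` magnitude does not give.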

Orientation Fan
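An orientation fan can be read as the wedge of field a player is facing. A minimal sketch, assuming a fan of plus or minus `half_angle` degrees (45° here, a hypothetical value) centered on `o`, a hypothetical maximum range of 10 yards, and the same clockwise-from-+y convention as `dir`, tests whether a target such as the ball falls inside the fan.

```python
import math


def bearing_deg(x: float, y: float, tx: float, ty: float) -> float:
    """Bearing from (x, y) to (tx, ty), clockwise from the +y axis, in [0, 360)."""
    return math.degrees(math.atan2(tx - x, ty - y)) % 360.0


def in_orientation_fan(x, y, o_deg, tx, ty, half_angle=45.0, max_range=10.0) -> bool:
    """True if the target lies within +/- half_angle of orientation o and within max_range yards.

    half_angle and max_range are hypothetical defaults, not values from the source.
    """
    if math.hypot(tx - x, ty - y) > max_range:
        return False
    # Smallest signed angular difference, handling the 0/360 wrap-around
    diff = (bearing_deg(x, y, tx, ty) - o_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_angle


# Player at (50, 20); target 5 yards straight up the +y axis
print(in_orientation_fan(50.0, 20.0, 350.0, 50.0, 25.0))  # True (10 degrees off)
print(in_orientation_fan(50.0, 20.0, 170.0, 50.0, 25.0))  # False (facing away)
```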

Projected Movement with Orientation Fan

Projected Movement to the Ball
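One reading of projected movement to the ball, assumed here, is the distance between a defender's projected position and the ball's projected position at the same horizon, with both extrapolated at constant velocity; a shrinking distance suggests the defender is closing on the ball. The function names, inputs, and horizon are hypothetical.

```python
import math


def project(x, y, s, dir_deg, t):
    """Constant-velocity projection; assumes dir is measured clockwise from the +y axis."""
    rad = math.radians(dir_deg)
    return x + s * math.sin(rad) * t, y + s * math.cos(rad) * t


def projected_distance_to_ball(player, ball, t=1.0):
    """Distance (yards) between player and ball after both move for t seconds.

    player and ball are (x, y, s, dir) tuples taken from the same frame.
    """
    px, py = project(*player, t)
    bx, by = project(*ball, t)
    return math.hypot(px - bx, py - by)


defender = (60.0, 20.0, 4.0, 270.0)  # illustrative: moving toward -x at 4 yd/s
ball = (52.0, 20.0, 4.0, 90.0)       # illustrative: moving toward +x at 4 yd/s
now = projected_distance_to_ball(defender, ball, t=0.0)
later = projected_distance_to_ball(defender, ball, t=0.5)
print(round(now, 2), round(later, 2))  # 8.0 4.0
```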

Positions

Start Points

Alignment Clusters
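Alignment clusters are not described on the slide; a minimal sketch, assuming defenders' pre-snap start points are grouped with plain k-means (Lloyd's algorithm) using a deterministic farthest-first initialization, is below. The coordinate frame (depth off the line of scrimmage, lateral offset from the ball), `k`, and the sample points are hypothetical; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math


def farthest_first(points, k):
    """Deterministic init: start at points[0], then add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members (unchanged if empty)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


# Hypothetical start points: (depth off the line of scrimmage, lateral offset from the ball), yards
starts = [(1, -2), (1, 0), (1, 2),   # down linemen
          (6, -2), (6, 2),           # linebackers
          (13, -1), (13, 1)]         # safeties
centroids, labels = kmeans(starts, k=3)
print(sorted(centroids))  # [(1.0, 0.0), (6.0, 0.0), (13.0, 0.0)]
```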

Literature Review

Previous NFL Big Data Bowl Competitions

  • 2020: How many yards will an NFL player gain after receiving a handoff?
  • 2021: Evaluate defensive performance on passing plays
  • 2022: Evaluate special teams performance
  • 2023: Evaluate linemen on pass plays

Analysis

Modeling

  • Penalized Regression
  • Random Forest
  • XGBoost
  • Neural Network
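Of the candidate models, penalized regression is the simplest to sketch. A minimal, standard-library-only example of L2-penalized (ridge) logistic regression fit by batch gradient descent, predicting whether a defender makes the tackle from one hypothetical feature (distance to the ball). The toy data, learning rate `lr`, and penalty `lam` are illustrative; in practice a library such as scikit-learn's `LogisticRegression` or R's glmnet would be used.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """L2-penalized logistic regression by batch gradient descent.

    Minimizes mean log loss + (lam / 2) * ||w||^2; the intercept b is not penalized.
    X: list of feature lists, y: list of 0/1 labels. Returns (b, w).
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient of the penalty (lam / 2) * ||w||^2 is lam * w
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy data: distance to the ball (yards) and whether the defender made the tackle
X = [[0.5], [1.0], [1.5], [2.0], [6.0], [8.0], [10.0], [12.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
b, w = fit_ridge_logistic(X, y)
print(predict_proba(b, w, [1.0]), predict_proba(b, w, [10.0]))
```

The fitted weight on distance comes out negative, so nearer defenders receive higher tackle probabilities; the penalty keeps the weight from growing without bound on nearly separable data.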

Cross Validation

  • 70/30 Split
  • May need more compute power
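With tracking data, a 70/30 split can leak information if frames from the same play or game land in both sets, because neighboring frames are nearly identical. A minimal sketch, assuming the split is made at the game level, with hypothetical `game_id` values:

```python
import random


def split_by_game(rows, train_frac=0.7, seed=42):
    """Split rows (dicts with a 'game_id' key) by game so that no game appears in both sets."""
    games = sorted({r["game_id"] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(games)
    n_train = round(train_frac * len(games))
    train_games = set(games[:n_train])
    train = [r for r in rows if r["game_id"] in train_games]
    test = [r for r in rows if r["game_id"] not in train_games]
    return train, test


# Hypothetical rows: 10 games x 5 frames each
rows = [{"game_id": g, "frame_id": f} for g in range(10) for f in range(5)]
train, test = split_by_game(rows)
print(len(train), len(test))  # 35 15
```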

Tackles Above or Below Expected



\(\sum_{i=1}^{N} (\mathbb{I}_{\text{tackle}_i} - P(\text{tackle}_i))\)

Where:

  1. \(N\) is the total number of plays on which the defender is on the field
  2. \(P(\text{tackle}_i)\) is the model's predicted probability that the defender makes the tackle on play \(i\)
  3. \(\mathbb{I}_{\text{tackle}_i}\) is the indicator function, which is 1 if the defender made the tackle on play \(i\) and 0 otherwise
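For a single defender, the sum reduces to observed tackles minus the sum of predicted probabilities. A minimal sketch with hypothetical per-play predictions:

```python
def tackles_over_expected(tackled, probs):
    """Sum over plays of (1 if the defender made the tackle else 0) minus the predicted probability."""
    if len(tackled) != len(probs):
        raise ValueError("tackled and probs must have the same length")
    return sum(int(t) - p for t, p in zip(tackled, probs))


# Hypothetical defender over 4 plays: 2 tackles against 1.2 expected
tackled = [True, False, True, False]
probs = [0.6, 0.1, 0.3, 0.2]
print(round(tackles_over_expected(tackled, probs), 2))  # 0.8
```

A positive value means the defender made more tackles than the model expected; a negative value means fewer.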

Next Steps

  1. Model Creation: Fit the candidate models (penalized regression, random forest, XGBoost, neural network) to estimate each defender's tackle probability.
  2. Model Tuning: Tune model parameters with cross-validation to improve accuracy and reliability.
  3. Feature Development: Engineer new features that capture the dynamics of player movement and game events, supporting real-time analysis.

Thanks